Computational Psychiatry
Ubiquity Press, Ltd.
All preprints, ranked by how well they match Computational Psychiatry's content profile, based on 12 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Wirth, L. A.; Sadedin, N.; Meder, B.; Schad, D. J.
Background: Pavlovian responding is a core component of behavior and can be measured via Pavlovian-instrumental transfer (PIT), where Pavlovian responses bias instrumental actions. Standard single-lever PIT paradigms, which assess responses using a single choice option, cannot dissociate the contributions of model-free versus model-based reinforcement learning. While indirect evidence suggests a role for model-free responding in single-lever PIT, the contribution of model-based strategies is unclear. It also remains unknown whether internal cognitive states, such as mind wandering, specifically impair model-based but not model-free PIT, as is theoretically expected. Methods: We developed a novel, trial-by-trial two-stage PIT paradigm designed to computationally dissociate model-free and model-based Pavlovian responding by leveraging probabilistic state transitions and trial-wise outcome predictions. After each two-stage Pavlovian learning trial, participants performed a single-lever PIT trial as well as a query trial requiring an explicit value judgment. Detailed task instructions were provided to support potential model-based strategies. Computational modeling was used to quantify individual learning strategies. We also administered mind-wandering questionnaires and thought probes. Results: Analysis of query and PIT trials revealed trial-by-trial updating of outcome expectations based on the probabilistic task structure, consistent with model-based Pavlovian responding. Behavioral responses during PIT were best explained by a model-based reinforcement learning model. In contrast, we found little evidence for model-free Pavlovian responding. Higher levels of mind wandering were associated with reduced model-based control but did not affect model-free indices. Conclusion: We introduce a novel single-lever PIT paradigm that enables fine-grained dissociation of model-free versus model-based Pavlovian response systems.
Our findings provide evidence that single-lever PIT can operate through model-based mechanisms, challenging the assumption that single-lever PIT is predominantly model-free. They also indicate that internal attentional states selectively modulate model-based PIT. Given the involvement of Pavlovian responding in numerous psychiatric conditions, our paradigm offers new avenues for understanding maladaptive behavior. Author Summary: Our daily actions are often influenced by cues, like the smell of food or the sound of phone notifications, that signal potential rewards or losses. These Pavlovian cues can shape our instrumental behavior even though their outcomes do not depend on what we do, a process known as Pavlovian-instrumental transfer (PIT). Here we study the computational learning mechanisms that underlie such PIT effects. While it is often assumed that Pavlovian responding follows simple, automatic rules without a cognitive model of cue consequences (i.e., model-free), evidence also shows a role for cognitive anticipation in Pavlovian responding (i.e., model-based). In this study, we extend this evidence by showing that PIT responding can be driven by flexible model-based learning. We designed a task, with detailed instructions, to test whether participants use model-free versus model-based strategies to guide PIT. Using reinforcement learning models, we found that most participants used model-based learning when forming cue-outcome associations. Importantly, people's attention mattered: when they were more distracted and mind wandering, they relied less on model-based strategies. Our findings suggest that Pavlovian learning is complex, flexible, and influenced by internal mental states, opening new windows on decision-making problems in mental health conditions like addiction.
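The model-based versus model-free dissociation described above is commonly formalized in two-stage tasks as a weighted blend of two value estimates. The sketch below is a generic illustration under assumed values (the function, weights, and transition matrix are hypothetical), not the authors' fitted model:

```python
import numpy as np

def hybrid_values(q_mf, q_stage2, transition_probs, w):
    """Blend model-free and model-based values for two first-stage cues.

    q_mf: model-free cue values, shape (2,)
    q_stage2: learned second-stage state values, shape (2,)
    transition_probs: P(state | cue), shape (2, 2)
    w: weight on the model-based estimate (0 = purely model-free)
    """
    q_mb = transition_probs @ q_stage2  # expected value under the transition model
    return w * q_mb + (1.0 - w) * q_mf

# Hypothetical values: cue 0 usually leads to the currently rewarding state.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
print(hybrid_values(np.array([0.2, 0.5]), np.array([1.0, 0.0]), T, w=1.0))
```

A purely model-based agent (w = 1) values the cues by their transition structure alone, so its preferences can flip immediately when second-stage values change, which is the kind of trial-wise updating the query trials are designed to detect.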
Laessing, P.; Karvelis, P.; Kennedy, J.; Zai, C.; Dayan, P.; Diaconescu, A.
Pavlovian "approach or avoid" impulses are critical behavioral biases that, in excess, are linked to multiple psychiatric conditions. To investigate how such biases contribute to suicidal thoughts and behaviors, we analyzed data from two clinical populations completing an aversive Go/NoGo task. This task disentangles motor action (Go or NoGo) from outcome valence (escape from, or avoidance of, an aversive stimulus), enabling the isolation of Pavlovian biases from instrumental learning processes. We compared multiple computational models that had previously been proposed to explain Pavlovian tendencies, including reinforcement learning, active inference, and drift diffusion-based approaches. We employed a hierarchical Bayesian inference procedure that treats model identity as a random factor at the individual level, allowing an unbiased determination of which mechanisms most accurately captured participants' behavior. Across both datasets, models featuring Pavlovian context biases plus a value-decay mechanism best accounted for performance. By contrast, policy-based Pavlovian models and more complex approaches, such as those integrating working memory or active inference, were supported by fewer study participants. These findings suggest that reflexive biases exert a persistent influence on decision-making, and that value decay plays a critical role in shaping behavior over time. Our results demonstrate the importance of systematically comparing and accounting for relevant cognitive processes to explain observed task behaviors. Understanding the factors contributing to task performance may help clarify how Pavlovian tendencies relate to psychopathology, including, in our case, elevated suicide risk.
Finally, we illustrate how a complete hierarchical model selection framework can be applied to identify the most plausible mechanisms underlying Pavlovian biases, offering a robust approach for advancing our understanding of task behaviors and establishing clinical utility in future studies. Author Summary: Automatic "approach or avoid" reactions shape behavior, particularly in stressful or negative situations. In this study, we explored how these reflex-like tendencies might contribute to suicidal thoughts and behaviors. Two clinical groups completed a computerized task measuring responses to unpleasant sounds. Participants made either active responses (pressing a button to stop a sound) or passive responses (refraining from pressing to avoid starting a sound), allowing us to examine the interplay of automatic impulses and learning from past experiences. Our analysis showed that behavior was best explained by a model combining stable "approach or avoid" impulses with a forgetting process that reduced reliance on past experiences over time. More complex models involving strategies or memory-based control were less effective. These findings suggest that individuals with suicidal tendencies may rely on persistent reflex-like behaviors and overweight recent outcomes, compromising their ability to learn in uncertain environmental conditions. Understanding these cognitive processes provides insights into why some individuals feel trapped in harmful patterns of thought and behavior. Our work highlights how identifying shared traits in clinical populations using model-based methods can inform targeted mental health interventions and improve our understanding of cognitive functioning across disorders.
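The winning model class, a Pavlovian context bias plus value decay, can be caricatured with a generic Rescorla-Wagner sketch. The functions, parameter values, and the linear bias term below are illustrative assumptions, not the authors' exact specification:

```python
def decay_update(q, chosen, reward, alpha=0.3, decay=0.05, q0=0.0):
    """Rescorla-Wagner update with value decay toward the initial value q0,
    so older experiences gradually lose their influence ('forgetting')."""
    q = {a: v + decay * (q0 - v) for a, v in q.items()}  # all values decay
    q[chosen] += alpha * (reward - q[chosen])            # standard RW update
    return q

def go_propensity(q_go, pav_bias, context_value):
    """Go propensity = instrumental value plus a Pavlovian bias scaled by the
    context's value (negative in an aversive context, suppressing Go)."""
    return q_go + pav_bias * context_value

q = decay_update({"go": 0.0, "nogo": 0.0}, "go", 1.0)
print(q["go"], go_propensity(q["go"], pav_bias=0.5, context_value=-1.0))
```

The decay term is what makes the bias "persistent" relative to learning: instrumental values erode between updates while the Pavlovian term does not.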
Kim, Y.; Brandt, L.; Cheung, K.; Nunes, E. V.; Roll, J.; Luo, S. X.; Liu, Y.
Contingency Management (CM) is a psychological treatment that aims to change behavior with financial incentives. In substance use disorders (SUDs), deployment of CM has been shaped by longstanding discussions of the cost-effectiveness of prize-based and voucher-based approaches. In prize-based CM, participants earn draws to win prizes, including small incentives to reduce costs, and the number of draws escalates with the duration of maintained abstinence. In voucher-based CM, participants receive a predetermined voucher amount based on specific substance test results. While both types have enhanced treatment outcomes, there is room for improvement in their cost-effectiveness: the voucher-based system requires an enduring financial investment, while the prize-based system might sacrifice efficacy. Previous work in the computational psychiatry of SUDs typically employs frameworks in which participants make decisions to maximize their expected compensation. In contrast, we developed new frameworks in which clinical decision-makers choose actions, CM structures, to reinforce participants' abstinence behavior. We consider the choice of voucher or prize to be a sequential decision with two pivotal parameters: the prize probability for each draw and the escalation rule determining the number of draws. Recent advances in reinforcement learning, specifically in off-policy evaluation, provide techniques to estimate outcomes for different CM decision scenarios from observed clinical trial data. We searched for CM schemas that maximized treatment outcomes under budget constraints. Using this framework, we analyzed data from the Clinical Trials Network to construct unbiased estimators of the effects of new CM schemas. Our results indicated that the optimal CM schema would strengthen reinforcement rapidly in the middle of the treatment course. Our estimated optimal CM policy improved treatment outcomes by 32% while maintaining costs.
Our methods and results have broad applications in future clinical trial planning and translational investigations on the neurobiological basis of SUDs.
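The off-policy evaluation idea the authors draw on can be illustrated with an ordinary importance-sampling estimator. This generic sketch (hypothetical policies, undiscounted returns) is not the estimator used in the study:

```python
def is_estimate(trajectories, target_policy, behavior_policy):
    """Ordinary importance-sampling estimate of the mean return under
    target_policy, using trajectories collected under behavior_policy.

    Each trajectory is a list of (state, action, reward) triples; policies
    map (state, action) to a probability.
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0
        ret = 0.0
        for state, action, reward in traj:
            weight *= target_policy(state, action) / behavior_policy(state, action)
            ret += reward
        total += weight * ret
    return total / len(trajectories)

# Toy check: if the target equals the behavior policy, the estimate is just
# the empirical mean return.
trajs = [[("s0", "a", 1.0)], [("s0", "b", 0.0)]]
uniform = lambda s, a: 0.5
print(is_estimate(trajs, uniform, uniform))  # 0.5
```

Reweighting observed trial data this way is what lets candidate CM schemas be scored without running a new trial for each one, at the cost of higher variance when the candidate schema diverges from the one actually deployed.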
Mason, L.; Woelk, S.; Eldar, E.; Rutledge, R.
Background: Intuitively, emotional states guide not only the actions we take, but also our confidence in those actions. This sets the stage for subjective confidence about the best action to diverge from the actual likelihood and, clinically, may give rise to over-confidence and risky behaviours during episodes of elevated mood, and the reverse during depressive episodes. Whilst computational models have been proposed to explain how emotional states recursively bias perception of action outcomes, these models have not been extended to capture the impact of mood on confidence. Here we propose a computational model that formalises confidence and its relationship with learning from outcomes and emotional states. Methods: We collected data both in a laboratory context (n = 35) and in a pre-registered online replication (n = 106; https://osf.io/ygc4t). Participants completed a two-armed bandit task, with learning blocks before and after a mood manipulation in which they unexpectedly received (positive mood induction) or lost (negative mood induction) a relatively large sum of money. Participants periodically reported their decision confidence throughout the task. We examined the extent to which the mood manipulation biased their confidence, predicting that positive and negative moods would lead to over- and under-confidence, respectively. We further predicted that this effect would be stronger in participants with a greater propensity towards strong and changeable moods, measured by the Hypomanic Personality Scale. Moreover, we formalised a computational model in which confidence emerges as the difference between the perceived likelihoods of reward for the available options. In this model, mood biases confidence indirectly, through recursively biased learning of the reward likelihoods for the available options, and not by simply shifting overall confidence up or down.
Results: In both experiments, we confirmed that moods impacted confidence in the hypothesised direction; absent any differences in participants' objective performance, average confidence was higher following positive mood induction and lower following negative mood induction. This effect was larger in participants with higher levels of trait hypomania. Intriguingly, we found that the effect of mood on confidence emerged in concert with learning. Indeed, whilst the shift in mood was greatest immediately post-manipulation and returned to baseline by the end of the learning block, the effect of mood on confidence gradually accumulated over learning trials, peaking at the end of the block. These dynamics were captured by simulations of a "Moody Likelihood" model. Empirically, this model simultaneously accounted for the effects of mood on choices, mood states and confidence through a mood bias parameter. Conclusion: We present a unified model in which moods recursively bias reward learning and, consequently, confidence in decision making. Moods fundamentally bias the accumulation of reward likelihood, rather than directly biasing decision confidence. Clinically, these findings have implications for understanding two core symptoms of mood disorder, suggesting that both perturbed mood and confidence about goal-directed behaviour arise from a common bias during reward learning.
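The core mechanism, mood biasing the *perceived* outcome rather than confidence itself, can be sketched in a few lines. The update rule, clamping, and parameter values below are illustrative assumptions, not the authors' fitted "Moody Likelihood" model:

```python
def moody_update(p, outcome, mood, alpha=0.2, bias=0.5):
    """Update the perceived reward likelihood p of an option.

    Mood (in [-1, 1]) biases how the outcome is perceived, so its effect on
    confidence compounds over learning rather than shifting it directly.
    """
    perceived = min(1.0, max(0.0, outcome + bias * mood))
    return p + alpha * (perceived - p)

def confidence(p_chosen, p_other):
    """Confidence as the difference between perceived reward likelihoods."""
    return p_chosen - p_other
```

Because the bias enters through the learning step, repeated updates under elevated mood gradually inflate `p_chosen`, reproducing the result that mood's effect on confidence peaks at the end of the learning block even after mood itself has returned to baseline.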
Joyce, D. W.; Meyer, N.
Validated instruments, such as questionnaires, patient-reported outcome measures and clinician-rated psychopathology scales, are indispensable for measuring symptom burden and mental state, and for defining outcomes in both psychiatric practice and clinical trials. Most often, the values on the instrument's multiple items (dimensions) are added to derive a single, univariate (scalar) sum-score. Although this approach simplifies interpretation, there are always many possible combinations of individual items that can yield the same sum-score. Two patients can therefore obtain identical scores on a given instrument despite having very different combinations of underlying item scores, corresponding to different patterns of clinical symptoms. The same is true when a single patient is measured at two different time points, where the resulting sum-scores can obscure changes that may be clinically meaningful. We present an alternative analytic framework, which leverages geometric concepts to represent measurements as points in a vector space. Using this framework, we show why sum-scores obscure information present in measurements of clinical state, and provide a straightforward algorithm to mitigate this problem. Clinically relevant outcomes, such as remission or patient-centered treatment goals, can be represented intuitively as reference points, or anchors, within this space. Using real-world data, we then demonstrate how measuring the relative distance between points and anchors preserves more information, allowing outcomes, such as proximity to remission, to be defined and measured.
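The sum-score degeneracy and the distance-to-anchor alternative are easy to demonstrate. The item vectors below are invented for illustration, and Euclidean distance is only one of the geometric measures such a framework could use:

```python
import math

def sum_score(items):
    """Conventional scalar sum-score."""
    return sum(items)

def distance_to_anchor(items, anchor):
    """Euclidean distance from an item-score vector to a clinical anchor
    (e.g., the item profile defining remission)."""
    return math.sqrt(sum((x - a) ** 2 for x, a in zip(items, anchor)))

# Two hypothetical patients with identical sum-scores but different profiles.
patient_a = [4, 0, 0, 0]   # one severe symptom
patient_b = [1, 1, 1, 1]   # four mild symptoms
remission = [0, 0, 0, 0]
print(sum_score(patient_a), sum_score(patient_b))        # identical: 4 4
print(distance_to_anchor(patient_a, remission),
      distance_to_anchor(patient_b, remission))          # distinct: 4.0 2.0
```

The sum collapses the two profiles to the same scalar, while the distance to a remission anchor preserves the difference, the core point of the paper's framework.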
Talwar, A.; Cormack, F.; Huys, Q. J. M.; Roiser, J. P.
Risky decisions involve choosing between options where the outcomes are uncertain. Cognitive tasks such as the CANTAB Cambridge Gamble Task (CGT) have revealed that patients with depression make more conservative decisions, but the mechanisms of choice evaluation underlying such decisions, and how they lead to the observed differences in depression, remain unknown. To investigate this, we used a computational modelling approach in a broad general-population sample (N = 753) who performed the CANTAB CGT and completed questionnaires assessing symptoms of mental illness, including depression. We fit five different computational models to the data, including two novel ones, and found that a novel model, which uses an inverse power function in the loss domain (contrary to standard Prospect Theory accounts) and is influenced by the probabilities but not the magnitudes of different outcomes, captured the characteristics of our dataset very well. Surprisingly, model parameters were not significantly associated with any mental health questionnaire scores, including depression scales; they were, however, related to demographic variables, particularly age, with stronger associations than typical model-agnostic task measures. This study showcases a new methodology for analysing data from the CANTAB CGT, describes a noteworthy null finding with respect to mental health symptoms, and demonstrates the added precision that a computational approach can offer.
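One reading of "an inverse power function in the loss domain" is a utility curve whose loss exponent is the reciprocal of the gain exponent, making losses loom super-linearly. Both the form and the parameter below are assumptions for illustration, not the fitted CGT model:

```python
def subjective_utility(x, rho=0.5):
    """Power utility for gains; inverse-power (reciprocal exponent) utility
    for losses. Illustrative sketch only -- the exact functional form and
    rho are hypothetical, not the authors' fitted specification."""
    if x >= 0:
        return x ** rho            # concave over gains
    return -(abs(x) ** (1.0 / rho))  # convex, steep over losses

print(subjective_utility(4.0), subjective_utility(-4.0))  # 2.0 -16.0
```

Under this shape a 4-point loss is felt far more strongly than a 4-point gain, which is one mechanism that could produce the conservative betting the CGT literature reports.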
Gauld, C.; Depannemaecker, D.; Serre, F.; Auriacombe, M.
Substance Use Disorders (SUD) can be conceptualized as a prospective link from cues to craving and use. To explore the nonlinear relationships between craving and cues, this study applied dynamical systems theory (DST) to ecological momentary assessment (EMA) data. Optimized linear Seasonal Auto-Regressive Integrated Moving Average with eXogenous variable (SARIMAX) models were used to phenotype patients with SUD (alcohol, tobacco, cannabis, opiates, and cocaine), considering the potential for complex interactions between cue exposure and craving intensity in daily life. These phenotypic profiles were replicated in computational DST models to analyze the nonlinear interactions between cues, craving, and use. The study involved 211 individuals and 8,260 observations, with 154 patients fitting the SARIMAX model for the influence of cues on craving, and 57 patients fitting the SARIMAX model for a possible influence of craving on cues. Two DST models were fitted to replicate the complex temporal dynamics of SUD based on these two directions of influence. The first DST model (fitted to the influence of cues on craving) showed that an increase in cues leads to a rise in craving, which then diminishes both cues and craving itself, with use patterns following craving's trajectory. This patient profile is driven by a phenomenon of "maximum cue saturation". The second DST model (fitted to the influence of craving on cues) demonstrated that an increase in craving was followed by an increase in cue reporting, leading to use, with use peaking and then reducing craving. This patient profile is characterized by a phenomenon of "maximum use saturation". Both models highlight craving as an essential modulator between cues and use, opening new therapeutic avenues.
Wei, M.; Zhang, H.; Peng, Q.
Background: Early initiation of substance use is linked to later adverse outcomes, and risk factors come from multiple domains and are shared across substances. In our previous work, traditional time-to-event Cox models identified individual risk factors, but these models are not designed to jointly model multiple outcomes or capture complex non-linear relationships. Multi-task learning (MTL) can leverage shared structure across related outcomes to improve prediction and distinguish common versus substance-specific predictors. However, most MTL studies rely on baseline features and focus on single outcomes, which limits their ability to capture shared risk and temporal changes. Substance use initiation is a time-dependent process that unfolds during development and reflects changing exposures over time. Baseline-only models cannot capture these changes or represent risk dynamics. Discrete-time modeling provides a practical approach by estimating interval-level initiation risk and combining it into cumulative risk at the subject level. By integrating multi-task learning with dynamic modeling, it is possible to share information across outcomes while capturing how risk evolves over time, which may improve prediction performance. Methods: Using the Adolescent Brain Cognitive Development (ABCD) Study (release 5.1), we developed two complementary multi-task learning (MTL) frameworks to predict initiation of alcohol, nicotine, cannabis, and any substance use. A baseline MTL model predicted fixed-horizon (48-month) initiation using one record per participant, while a dynamic discrete-time MTL model incorporated longitudinal interval data to model time-varying risk. Both models used multi-domain environmental exposures, core covariates, and polygenic risk scores (PRS). Performance was evaluated on a held-out test set using AUROC, PR-AUC, and calibration metrics, and compared with single-task logistic regression (LR).
Feature importance was assessed using permutation importance and compared with Cox proportional hazards models. Results: MTL showed comparable or improved performance relative to LR, with larger gains for low-prevalence outcomes (cannabis and nicotine). Incorporating longitudinal information led to consistent improvements across all outcomes. Dynamic models increased AUROC by +0.044 to +0.062 for MTL and +0.050 to +0.084 for LR, indicating that temporal information was the primary driver of performance gains. Feature importance analyses showed modest overlap across methods, with higher agreement between dynamic MTL and Cox models than static MTL. A small set of features, including externalizing behavior, parental monitoring, and developmental factors, was consistently identified across all approaches. Conclusions: Dynamic multi-task learning improves the prediction of substance use initiation by leveraging longitudinal structure and shared information across outcomes. While MTL provides additional gains, incorporating time-varying information is the dominant factor for improving performance. Combining baseline and dynamic frameworks offers a comprehensive strategy for identifying robust risk factors and modeling adolescent substance use initiation.
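The discrete-time construction described above, interval-level hazards combined into a subject-level cumulative risk, is compact to state. The sketch assumes intervals are conditionally independent given the features, which is the standard discrete-time survival assumption:

```python
def cumulative_risk(hazards):
    """Combine per-interval initiation hazards h_t (e.g., model-predicted
    probabilities of first use within each follow-up interval) into a
    subject-level cumulative risk: 1 minus the product of interval
    survival probabilities."""
    survival = 1.0
    for h in hazards:
        survival *= 1.0 - h  # probability of surviving (not initiating) interval t
    return 1.0 - survival

print(cumulative_risk([0.1, 0.1, 0.1]))  # risk accumulated over three intervals
```

This is what lets a dynamic model emit a time-varying hazard per visit yet still be compared head-to-head with a fixed-horizon baseline model on cumulative-risk metrics such as AUROC.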
Kalhan, S.; Schwartenbeck, P.; Hester, R.; Garrido, M. I.
Adaptive behaviours depend on dynamically updating internal representations of the world based on ever-changing environmental contingencies. People with a substance use disorder (pSUD) show maladaptive behaviours, with high persistence in drug-taking despite severe negative consequences. We recently proposed a salience misattribution model for addiction (SMMA; Kalhan et al., 2021), arguing that pSUD have aberrations in their updating processes whereby drug cues are misattributed as strong predictors of positive outcomes but weaker predictors of negative outcomes. We also argue that, conversely, non-drug cues are misattributed as weak predictors of positive outcomes but stronger predictors of negative outcomes. However, these hypotheses need to be empirically tested. Here we used a multi-cue reversal learning task, with reversals in whether drug or non-drug cues are currently relevant in predicting the outcome (monetary win or loss). We show that, compared to controls, people with a tobacco use disorder (pTUD) do form misaligned internal representations. We found that pTUD updated less towards the drug cues' relevance in predicting a loss. Further, when neither drug nor non-drug cues predicted a win, pTUD updated more towards the drug cue being a relevant predictor of that win. Our Bayesian belief updating model revealed that pTUD had a low estimated likelihood of non-drug cues being predictors of wins, compared to drug cues, which drove the misaligned updating. Overall, several hypotheses of the SMMA were supported, but not all. Our results suggest that strengthening the non-drug cue association with positive outcomes may help restore the misaligned internal representation in pTUD.
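The belief-updating component, inferring which cue is currently the relevant predictor from trial outcomes, can be sketched as a normalized Bayesian posterior update. The cue likelihoods below are invented to mimic the reported asymmetry, not the fitted model:

```python
def update_relevance(prior, likelihoods, outcome):
    """Posterior over which cue is currently relevant, given the observed
    outcome and each cue's estimated likelihood of predicting it."""
    unnorm = [p * lik[outcome] for p, lik in zip(prior, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hypothetical misaligned likelihoods: the drug cue is believed to predict
# wins well, the non-drug cue only at chance.
liks = [{"win": 0.8, "loss": 0.2},   # drug cue
        {"win": 0.5, "loss": 0.5}]   # non-drug cue
post = update_relevance([0.5, 0.5], liks, "win")
print(post)  # belief shifts toward the drug cue
```

With likelihoods skewed this way, every win pulls relevance toward the drug cue even when it was not actually predictive, which is the dynamic the abstract describes.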
Delawalla, C.; Zindel, L.; Jung, S.; Cohen, A. O.; Waldman, I.; Palmer, R. H. C.
Alcohol use disorder (AUD) is heterogeneous, and criteria can be met through several expressions of harmful use (Watts et al., 2021), ranging from frequent binge drinking episodes and low-grade excessive consumption (e.g., 2 drinks every evening) to everyday use associated with the neurobiological changes seen in active addiction. A precise understanding of use patterns, and of the cognitive and emotional experiences that drive them, is key both in a research context, where neurobiological, genetic, and personality factors may differ among use patterns (e.g., Delawalla et al., 2023), and in a clinical intervention context, where intervention for one use pattern may look very different from another (Nadkarni et al., 2022). We derived a four-factor model of compulsive alcohol use (CAU) in a representative sample of N = 2,004 Americans. The CAU model suggests the construct comprises Intrusive Thoughts, Emotionality, Craving, and Loss of Control. Further, we validated the emergent model against collateral measures to verify its structure and predictive value.
de Lacy, N.; Ramshaw, M.
Internalizing disorders (depression, anxiety, somatic symptom disorder) are among the most common mental health conditions and can substantially reduce daily life function. Early adolescence is an important developmental stage for the increase in prevalence of internalizing disorders, and understanding the specific factors that predict their onset may be germane to intervention and prevention strategies. We analyzed ~6,000 candidate predictors from multiple knowledge domains (cognitive, psychosocial, neural, biological), contributed by children of late elementary school age (9-10 yrs) and their parents in the ABCD cohort, to construct individual-level models predicting the later (11-12 yrs) onset of depression, anxiety and somatic symptom disorder using deep learning with artificial neural networks. Deep learning was guided by an evolutionary algorithm that jointly optimized hyperparameters and performed automated feature selection, allowing more candidate predictors, and a wider variety of predictor types, to be analyzed than in the largest previous comparable machine learning studies. We found that the future onset of internalizing disorders could be robustly predicted in early adolescence with AUROCs ≥ ~0.90 and accuracy ≥ ~80%. Each disorder had a specific set of predictors, though parent problem behavioral traits and sleep disturbances represented cross-cutting themes. Additional computational experiments revealed that psychosocial predictors were more important to predicting early adolescent internalizing disorders than cognitive, neural or biological factors, and generated models with better performance. We also observed that the accuracy of individual-level models was highly correlated with the relative importance of their constituent predictors, suggesting that principled searches for predictors with higher importance or effect sizes could support the construction of more accurate individual-level models of internalizing disorders.
Future work, including replication in additional datasets, will help test the generalizability of our findings and explore their application to other stages in human development and mental health conditions.
Dirupo, G.; Westwater, M. L.; Khaikin, S.; Feder, A.; DePierro, J. M.; Charney, D. S.; Murrough, J. W.; Morris, L. S.
Deficits in inhibitory control are common across a wide range of psychiatric disorders and are closely linked to symptom severity, including emotional dysregulation, anxiety, substance misuse, and self-harm, making them an appealing target for intervention. Cognitive training offers a low-cost, scalable, and non-invasive strategy to strengthen inhibitory control; however, most existing paradigms target only a single facet of inhibition and rarely account for environmental influences, such as affective context. To address these gaps, we developed a computerized inhibitory control training paradigm to simultaneously engage three components of inhibition (preemptive, proactive, and reactive) while embedding trials within positive and negative affective contexts to assess the impact of emotional stimuli. Across two online experiments, participants completed the GAMBIT task in one session (Experiment 1, N = 300) or repeated over three sessions (Experiment 2, N = 65). The task included No-Go trials to train preemptive inhibition, stop-signal trials for reactive inhibition, and stop-signal anticipation trials to train proactive inhibition. Affective images of differing valence were presented as background stimuli to evaluate their impact on inhibitory performance. In Experiment 1, participants showed higher accuracy on No-Go versus reference Go trials (β = 1.45, SE = 0.09, p < .001), confirming successful manipulation of preemptive inhibition. Reaction times were slower during anticipation trials across two different conditions (β = 0.16, SE = 0.04, p < .001; β = 0.07, SE = 0.04, p = .047), consistent with proactive slowing when anticipating a potential stop signal. Additionally, positive affective images (β = 0.10, SE = 0.009, p < .001) further slowed RTs, indicating emotional interference with proactive control.
In Experiment 2, the pattern of higher No-Go accuracy was replicated (β = 0.91, SE = 0.11, p < .001) and accuracy generally improved over sessions (β = 0.38, SE = 0.06, p < .001). In anticipation trials, RTs became shorter across sessions (session 2: β = -0.25, SE = 0.06, p < .001; session 3: β = -0.45, SE = 0.06, p < .001), reflecting practice-related gains, and SSRTs decreased over time (F(2,56) = 6.26, p = .004), consistent with enhanced reactive inhibition. Proactive inhibition was modulated by affective images, with both negative (β = 0.04, SE = 0.02, p = .039) and positive (β = 0.16, SE = 0.02, p < .001) affective images associated with slower RTs. Participants also reported reductions in self-assessed temper control by the last session (W = 25.5, p = .007, q = .037, d = -0.51), and usability ratings were high (all means ≥ 3.87/5). Together, these findings show that this paradigm recruits multiple forms of inhibitory control and yields training-related improvements in both performance and affective outcomes. This provides preliminary validation of a scalable, fully online inhibitory control training tool targeting multiple dissociable inhibitory processes within affective contexts. The approach holds promise as an accessible transdiagnostic intervention to support symptom improvement across psychiatric disorders, with future work needed to evaluate clinical efficacy in patient populations.
Molitor, J.; Peters, J.
Background and aims: Gambling-related cognitive distortions (GRCD) are closely linked to problem gambling symptom severity (PGSI) and are associated with superstitious and delusional ideation, as well as, indirectly, with conspiracy beliefs. Although conceptually distinct, these belief domains share core features (e.g., erroneous beliefs about causal structure) and may be related to compulsivity. The latent factor structure underlying these belief domains is poorly understood. This preregistered study examined a potentially shared latent structure underlying GRCD, superstitious and delusional ideation, and conspiracy beliefs, and their links with PGSI and a transdiagnostic symptom dimension previously linked to addictive and compulsive psychopathology. Methods: Participants with previous gambling experience (N = 491) were recruited via Prolific and completed measures assessing each belief domain, along with dimensions of compulsive behavior and intrusive thought, anxious-depression, and social withdrawal. Several factor analytic models were compared to determine the optimal latent structure. Results: PGSI was significantly correlated with all irrational belief domains. Model comparison favored a bifactor model comprising a general factor accounting for over half of the variance (55%, closely aligned with superstitious content) and domain-specific factors related to GRCD and conspiracy beliefs. The association between compulsivity and PGSI was partially mediated by GRCDs, suggesting compulsivity may affect PGSI both directly and indirectly via a modulation of GRCDs. Conclusions: Findings confirm that problem gambling is associated with irrational beliefs beyond GRCDs and support a dimensional neurocognitive model in which general and domain-specific belief components are linked to compulsivity. This suggests shared mechanisms underlying the formation and persistence of irrational beliefs across domains, with unique features specific to GRCD.
Sun, J.; Ni, Y.; Li, J.
Evidence for positivity and optimism bias abounds in high-level belief updating. However, no consensus has been reached regarding whether learning asymmetries exist in more elementary forms of updating, such as reinforcement learning (RL). In RL, the learning asymmetry concerns the difference in sensitivity when incorporating positive versus negative prediction errors (PEs) into value estimates, namely the asymmetry of learning rates associated with positive and negative PEs. Although RL has been established as a canonical framework for interpreting agent-environment interactions, the direction of the learning-rate asymmetry remains controversial. Here, we propose that part of the controversy stems from the fact that people may hold different value expectations before entering the learning environment. Such a default value expectation influences how PEs are calculated and consequently biases subjects' choices. We test this hypothesis in two learning experiments with stable or varying reinforcement probabilities, across monetary gain, loss, and gain-loss mixture environments. Our results consistently support the model incorporating asymmetric learning rates and an initial value expectation, highlighting the role of initial expectation in value updating and choice preference. Further simulation and model parameter recovery analyses confirm the unique contribution of initial value expectation in assessing learning-rate asymmetry. Author Summary: While the RL model has long been applied to modeling learning behavior, where value updating stands at the core of the learning process, it remains controversial whether and how learning is biased when updating from positive versus negative PEs. Here, through model comparison, simulation, and recovery analyses, we show that accurate identification of learning asymmetry is contingent on taking into account subjects' default value expectations in both monetary gain and loss environments. Our results stress the importance of initial expectation specification, especially in studies investigating learning asymmetry.
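The interaction between asymmetric learning rates and an initial value expectation that this abstract describes can be sketched with a minimal delta-rule model. This is a generic illustration, not the authors' fitted model; all names (`q0`, `alpha_pos`, `alpha_neg`) are illustrative:

```python
import numpy as np

def q_update(q, reward, alpha_pos, alpha_neg):
    """One delta-rule update with separate learning rates for
    positive and negative prediction errors (PEs)."""
    pe = reward - q
    alpha = alpha_pos if pe >= 0 else alpha_neg
    return q + alpha * pe

def simulate_values(rewards, q0, alpha_pos, alpha_neg):
    """Trial-by-trial value trajectory starting from an initial
    expectation q0 (the 'default value expectation')."""
    qs = [q0]
    for r in rewards:
        qs.append(q_update(qs[-1], r, alpha_pos, alpha_neg))
    return np.array(qs)
```

The point of the sketch: a pessimistic `q0` makes more outcomes register as positive PEs, so the same `(alpha_pos, alpha_neg)` pair can produce a different apparent asymmetry, which is why ignoring `q0` can bias asymmetry estimates.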
Stolicyn, A.; Romaniuk, L.; Lawrie, S. M.; Series, P.
Background: Cognitive deficits are a common symptom of depression and contribute significantly to the disabling effects of the disorder. Experimentally, they are observed as increased reaction times, increased error rates, and deficient performance adaptation after making errors or receiving adverse feedback, across multiple cognitive paradigms. In the current theoretical study we aimed to address the cause of these cognitive deficits. Methods: We constructed computational models of optimal resource allocation in two cognitive tasks: Delayed Match to Sample (DMS) and Eriksen Flanker (EF). The models explicitly link performance feedback values and beliefs about task controllability with measures of cognitive performance, including accuracy, reaction times, and post-error improvement in accuracy (PIA). We then introduced depression-related motivational changes (altered control beliefs and feedback values, representing learned helplessness, anhedonic valuation, and negative bias) to see whether these factors can account for deficits in cognitive performance. Results: In the DMS task, altered control beliefs and lower valuation of correct performance accounted for decreased accuracy and decreased PIA. In the EF task, altered control beliefs and lower correct-performance valuation could explain increased response times, decreased accuracy, and a decreased error-related negativity (ERN) signal. Increased valuation of adverse feedback, on the other hand, was linked to increased accuracy and an increased ERN signal. Furthermore, in the EF task, different combinations of depression-related motivational factors led to different patterns of cognitive performance, which could offer a basis for stratification. Conclusions: Our models offer an explicit computational and algorithmic bridge between known depression-related motivational factors (learned helplessness, anhedonic valuation) and commonly observed cognitive deficits (increased reaction times, decreased performance accuracy, worse post-error adaptation), contributing to a better understanding of depression.
Desender, K.; Verguts, T.
Reinforcement learning models describe how agents learn about the world (value learning) and how they interact with their environment based on the learned information (decision policy). As in any optimization problem, it is important to set the process hyperparameters, a process which is itself thought to be learned (meta-learning). Here, we test a key prediction of meta-learning frameworks, namely that there exist one or more meta-signals that govern hyperparameter setting. Specifically, we test whether decision confidence, in a context of varying outcome variability, informs hyperparameter setting. Participants performed a two-armed bandit task with confidence ratings. Model comparison shows that confidence and outcome variability are differentially involved in hyperparameter setting. A high level of confidence in the previous choice decreased the decision-noise hyperparameter on the current trial: when a choice was made with low confidence, the choice on the next trial tended to be more explorative (i.e., high decision noise). Outcome variability influenced another hyperparameter, the learning rate for positive prediction errors (thus affecting value learning). Both strategies are rational approaches that maximize earnings at different temporal loci: the modulation by confidence causes more frequent exploration early after a change point, whereas the modulation by outcome variability is advantageous late after a change point. Finally, we show that (reported) confidence in value-based choices reflects the action value of the chosen option (irrespective of the unchosen value). In sum, decision confidence and outcome variability reflect distinct signals that optimally guide the setting of hyperparameters in the decision policy and in value learning, respectively.
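The confidence-modulated decision-noise mechanism described here can be sketched as a softmax policy whose inverse temperature is scaled by previous-trial confidence. This is a minimal illustration of the idea, not the authors' fitted model; the functional form and parameter names (`beta0`, `kappa`) are assumptions:

```python
import numpy as np

def softmax_choice_probs(q_values, beta):
    """Softmax decision policy; higher beta = less decision noise."""
    z = beta * (q_values - np.max(q_values))  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def confidence_modulated_beta(beta0, kappa, confidence):
    """Meta-learning sketch: previous-trial confidence (in [0, 1])
    scales the inverse temperature, so low confidence yields a lower
    beta and thus more exploration on the next trial."""
    return beta0 * np.exp(kappa * (confidence - 0.5))
```

Under this sketch, a low-confidence trial produces a small `beta`, flattening the choice probabilities and making exploration more likely, which matches the pattern the abstract reports.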
Cheng, Z.; Ging-Jehli, N. R.; Tarlow, M.; Kim, J.; Chase, H. W.; Arora, M.; Bonar, L.; Stiffler, R.; Grattery, A.; Graur, S.; Frank, M. J.; Phillips, M. L.; Shenhav, A.
To behave adaptively, people need to integrate information about probabilistic outcomes and balance drives to approach positive outcomes and avoid negative ones. However, questions remain about how uncertainty in positive and negative outcomes influences approach-avoid decision-making dynamics. To fill this gap, we developed a novel Probabilistic Approach Avoidance Task (PAAT) and characterized behavior in this task using sequential sampling models. In this task, participants (Study 1: blinded mixed clinical sample, N = 34; Study 2: online nonpsychiatric sample, N = 58) made a series of choices between pairs of options, each consisting of variable probabilities of reaching a positive outcome (monetary reward) and a negative outcome (aversive image). Participants tended to choose options that maximized the likelihood of reward and minimized the likelihood of aversive outcomes. Moreover, the weights they placed on each of these differed for choices where the likelihoods were in opposition (i.e., the riskier option was also more rewarding; incongruent trials) relative to when they were aligned (congruent trials). Computational modeling revealed that the relative influence of rewarding and aversive outcomes on choice was captured by differences in the rate of decision-relevant information accumulation. These modeling results were validated with a series of model comparisons and posterior predictive checks, demonstrating that our sequential sampling models reliably captured the behavioral data. Together, these findings improve our understanding of the influence of motivational conflict, outcome type, and levels of uncertainty on approach-avoid decision-making.
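The sequential sampling account in this abstract can be illustrated with a minimal drift-diffusion trial simulation, where evidence accumulates toward an approach or avoid boundary. This is a generic Euler-Maruyama sketch, not the authors' model; parameter names and the two-boundary mapping (approach vs. avoid) are illustrative:

```python
import numpy as np

def simulate_ddm_trial(drift, threshold, rng, noise=1.0, dt=0.001, max_t=5.0):
    """Simulate one drift-diffusion trial: evidence starts at 0 and
    accumulates until it crosses +threshold (approach) or -threshold
    (avoid). Returns (choice, reaction_time)."""
    x, t = 0.0, 0.0
    while t < max_t:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if x >= threshold:
            return 1, t      # approach boundary
        if x <= -threshold:
            return -1, t     # avoid boundary
    return 0, max_t          # no boundary reached within max_t
```

In this framing, the "rate of decision-relevant information accumulation" mentioned in the abstract corresponds to `drift`: a larger (more reward-favoring) drift yields faster and more consistent approach choices.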
Satti, M. H.; Wille, K.; Nassar, M. R.; Cichy, R. M.; Schuck, N. W.; Dayan, P.; Bruckner, R.
Difficulties in adapting learning to meet the challenges of uncertain and changing environments are widely thought to play a central role in internalizing psychopathology, including anxiety and depression. This view stems from findings linking trait anxiety and transdiagnostic internalizing symptoms to learning impairments in laboratory tasks often used as proxies for real-world behavioral flexibility. These tasks typically require learners to adjust learning rates dynamically in response to uncertainty, for instance, increasing learning from prediction errors in volatile environments. However, prior studies have produced inconsistent and sometimes contradictory findings regarding the nature and extent of learning impairments in populations with internalizing disorders. To address this, we conducted eight experiments (N = 820) using predictive inference and reversal learning tasks, and applied a bi-factor analysis to capture internalizing symptom variance shared across and differentiated between anxiety and depression. While we observed robust evidence for adaptive learning-rate modulation across participants, we found no convincing evidence of a systematic relationship between internalizing symptoms and either learning rates or task performance. These findings challenge prominent claims that learning difficulties are a hallmark feature of internalizing psychopathology and suggest that the relationship between these traits and adaptive behavior under uncertainty may be more subtle than previously thought.
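The adaptive learning-rate modulation these tasks probe (learning faster from surprising prediction errors that likely signal a change point) can be sketched with a simple two-rate delta rule. This is a toy illustration of the general idea, not the authors' predictive-inference model; all parameter names and the thresholding rule are assumptions:

```python
import numpy as np

def predictive_inference(outcomes, lr_stable=0.1, lr_change=0.8, pe_threshold=2.0):
    """Adaptive learning-rate sketch: use a high learning rate when the
    prediction error is surprisingly large relative to the running PE
    variance (likely change point), and a low rate otherwise.
    Returns the prediction held before each outcome."""
    pred, preds = 0.0, []
    pe_var = 1.0  # running estimate of PE variance
    for y in outcomes:
        preds.append(pred)
        pe = y - pred
        big_surprise = abs(pe) > pe_threshold * np.sqrt(pe_var)
        lr = lr_change if big_surprise else lr_stable
        pred += lr * pe
        pe_var += 0.05 * (pe ** 2 - pe_var)  # slow variance tracking
    return np.array(preds)
```

After a large mean shift, the first outlying prediction error triggers the high learning rate, so the prediction jumps most of the way to the new mean in a single trial, then settles with the low rate.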
Dubois, M.; Aislinn, B.; Moses-Payne, M. E.; Habicht, J.; Steinbeis, N.; Hauser, T. U.
During childhood and adolescence, exploring the unknown is important to build a better model of the world. This means that youths have to regularly solve the exploration-exploitation trade-off, a dilemma in which adults are known to deploy a mixture of computationally light and heavy exploration strategies. In this developmental study, we investigated how youths (aged 8 to 17) performed an exploration task that allows us to dissociate these different exploration strategies. Using computational modelling, we demonstrate that tabula-rasa exploration, a computationally light exploration heuristic, is used to a higher degree in children and younger adolescents compared to older adolescents. Additionally, we show that this tabula-rasa exploration is more extensively used by youths with high attention-deficit/hyperactivity disorder (ADHD) traits. In the light of ongoing brain development, our findings show that children and younger adolescents use computationally less burdensome strategies, but that an excessive use thereof might be a risk for mental health conditions.
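The tabula-rasa exploration heuristic contrasted here with value-guided choice can be sketched as a mixture policy: with some probability the agent ignores learned values entirely and chooses uniformly at random. This is a minimal illustration under that assumption, not the study's fitted model; `epsilon` and `beta` are illustrative names:

```python
import numpy as np

def choice_probs(q_values, beta, epsilon):
    """Mixture of a value-guided softmax policy and a tabula-rasa
    (uniform random, value-free) policy; epsilon is the weight on
    the computationally light value-free heuristic."""
    z = beta * (q_values - np.max(q_values))
    soft = np.exp(z) / np.exp(z).sum()          # value-guided component
    uniform = np.ones_like(soft) / len(soft)    # tabula-rasa component
    return (1 - epsilon) * soft + epsilon * uniform
```

A larger `epsilon` corresponds to a greater reliance on the computationally light heuristic, which is the quantity the abstract reports as elevated in younger participants and those with high ADHD traits.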
Salomon, T.; Itzkovitch, A.; Daw, N. D.; Schonberg, T.
Cue-Approach Training (CAT) is a paradigm that enhances preferences without external reinforcements, suggesting a potential role for internal learning processes. Here, we developed a novel Bayesian computational model to quantify anticipatory response patterns during the training phase of CAT. Because this phase includes individual items, this marker is a potential index of internal learning signals at the item level. Our model, fitted to meta-analytic data from 29 prior CAT experiments, was able to predict individual differences in non-reinforced preference changes using a key computational marker. Crucially, two new experiments manipulated the training procedure to influence the model's predicted learning marker. As predicted and preregistered, the manipulation successfully induced differential preference changes, supporting a causal role for the model's marker. These findings demonstrate the powerful potential of our computational framework for investigating intrinsic learning processes. The framework could be used to predict preference changes and opens new avenues for understanding intrinsic motivation and decision-making. Teaser: Bayesian modeling of response times predicts individual differences in non-reinforced preference change.